HCV NS3/4A protease inhibitors QSAR

QSAR studies of the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by multiple linear regression (MLR) and support vector machine (SVM)

Qin, Z.J.; Wang, M.L.; Yan, A.X.*

Bioorganic & Medicinal Chemistry Letters, 2017, 27, 2931–2938.

Read the original article

Download the raw data

Download Supporting Information

In this study, quantitative structure-activity relationship (QSAR) models were built to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors, using multiple linear regression (MLR) and a support vector machine (SVM) with various descriptor sets and training/test set selection methods. A dataset of 512 HCV NS3/4A protease inhibitors was collected from the literature, together with their IC50 values, all determined by the same FRET assay. Each inhibitor was represented by nine selected global descriptors and 12 2D property-weighted autocorrelation descriptors calculated with the program CORINA Symphony. The dataset was divided into a training set and a test set either randomly or with a Kohonen self-organizing map (SOM). The best MLR model gave correlation coefficients (r2) of 0.75 for the training set and 0.72 for the test set; the best SVM model gave 0.87 and 0.85, respectively. In addition, a series of sub-dataset models was developed. The best sub-dataset models all performed better than the whole-dataset models. We believe that combining the best sub-dataset and whole-dataset SVM models can provide a reliable lead design tool for new NS3/4A protease inhibitor scaffolds in a drug discovery pipeline.
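
The SOM-based training/test split mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the procedure or software used in the paper: the grid size, learning schedule, test fraction, random toy descriptors, and the function names `train_som`, `best_matching_unit`, and `som_split` are all assumptions made for illustration. The idea shown is that compounds mapped to the same neuron are similar in descriptor space, so drawing test compounds from every occupied neuron spreads the test set over the whole chemical space.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the neuron whose weights are closest to x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))


def train_som(data, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                      # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):        # one shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def som_split(data, test_fraction=0.25, seed=0, **som_kwargs):
    """Assumed split rule: take about test_fraction of each neuron's compounds as test."""
    weights = train_som(data, seed=seed, **som_kwargs)
    by_neuron = {}
    for i, x in enumerate(data):
        by_neuron.setdefault(best_matching_unit(weights, x), []).append(i)
    rng = random.Random(seed)
    train, test = [], []
    for node in sorted(by_neuron):
        members = by_neuron[node]
        rng.shuffle(members)
        n_test = int(round(test_fraction * len(members)))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy data: 40 hypothetical compounds with 4 descriptors scaled to [0, 1].
toy_rng = random.Random(1)
descriptors = [[toy_rng.random() for _ in range(4)] for _ in range(40)]
train_idx, test_idx = som_split(descriptors, test_fraction=0.25)
print(len(train_idx), "training compounds,", len(test_idx), "test compounds")
```

In practice the descriptors would be scaled before training, and the map would be sized to the dataset; a random split, by contrast, simply shuffles the indices and cuts them at the chosen fraction.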

QSAR models: Dataset 1 (512 inhibitors)

Model Name	Algorithm	Descriptors	Splitting method	Training set r2	Training set sd	Training set MAE	Test set r2	Test set sd	Test set MAE
Model A1	MLR	7 CORINA Global	Random	0.67	0.95	0.73	0.58	1.05	0.76
Model A2	MLR	7 CORINA Global	Kohonen’s self-organizing map (SOM)	0.64	0.98	0.76	0.65	0.95	0.74
Model B1	MLR	7 CORINA Global	Random	0.64	0.98	0.77	0.54	1.10	0.81
Model B2	MLR	7 CORINA Global	Kohonen’s self-organizing map (SOM)	0.60	1.03	0.81	0.63	0.98	0.75
Model C1	MLR	2 CORINA Global 8 CORINA 2D	Random	0.77	0.80	0.62	0.67	0.93	0.65
Model C2	MLR	2 CORINA Global 8 CORINA 2D	Kohonen’s self-organizing map (SOM)	0.75	0.82	0.62	0.72	0.87	0.66
Model D1	MLR	2 CORINA Global 9 CORINA 2D	Random	0.75	0.83	0.65	0.67	0.94	0.68
Model D2	MLR	2 CORINA Global 9 CORINA 2D	Kohonen’s self-organizing map (SOM)	0.73	0.86	0.66	0.72	0.87	0.65
Model A3	SVM	7 CORINA Global	Random	0.80	0.73	0.53	0.72	0.84	0.60
Model A4	SVM	7 CORINA Global	Kohonen’s self-organizing map (SOM)	0.78	0.77	0.55	0.79	0.72	0.56
Model B3	SVM	7 CORINA Global	Random	0.81	0.72	0.52	0.73	0.83	0.58
Model B4	SVM	7 CORINA Global	Kohonen’s self-organizing map (SOM)	0.79	0.75	0.52	0.80	0.73	0.53
Model C3	SVM	2 CORINA Global 8 CORINA 2D	Random	0.90	0.54	0.42	0.75	0.82	0.55
Model C4	SVM	2 CORINA Global 8 CORINA 2D	Kohonen’s self-organizing map (SOM)	0.88	0.56	0.40	0.83	0.68	0.50
Model D3	SVM	2 CORINA Global 9 CORINA 2D	Random	0.90	0.53	0.41	0.81	0.70	0.49
Model D4	SVM	2 CORINA Global 9 CORINA 2D	Kohonen’s self-organizing map (SOM)	0.87	0.59	0.42	0.85	0.63	0.47
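
The three statistics reported in the table (r2, sd, MAE) can be computed from observed and predicted activities as sketched below. The page does not give the exact formulas, so two assumptions are made and marked in the comments: r2 is taken as the squared Pearson correlation between observed and predicted values, and sd as the square root of the residual sum of squares divided by n - 1. The activity values are made up for illustration.

```python
import math


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation (assumed definition of r2)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def residual_sd(y_true, y_pred):
    """Standard deviation of the residuals, assuming an n - 1 denominator."""
    n = len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / (n - 1))


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical observed and predicted activities for five compounds.
observed = [5.2, 6.1, 7.4, 8.0, 6.6]
predicted = [5.6, 5.9, 7.0, 7.7, 7.1]
print("r2  =", round(pearson_r2(observed, predicted), 2))
print("sd  =", round(residual_sd(observed, predicted), 2))
print("MAE =", round(mean_absolute_error(observed, predicted), 2))
```

Note that a squared Pearson correlation ignores constant offsets between prediction and observation, which is why sd and MAE are reported alongside it.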

QSAR models: Dataset 2 (355 linear inhibitors from Dataset 1)

Model	Splitting method	Algorithm	Descriptors	Training set r2	Training set sd	Training set MAE	Test set r2	Test set sd	Test set MAE
Model C2 (for predicting 355 linear inhibitors)	Kohonen’s self-organizing map (SOM)	MLR	2 CORINA Global 8 CORINA 2D	0.74	0.84	0.62	0.68	0.91	0.67
Model LA1	Kohonen’s self-organizing map (SOM)	MLR	2 CORINA Global 8 CORINA 2D	0.77	0.78	0.59	0.77	0.77	0.59
Model D4 (for predicting 355 linear inhibitors)	Kohonen’s self-organizing map (SOM)	SVM	2 CORINA Global 8 CORINA 2D	0.86	0.62	0.44	0.83	0.68	0.49
Model LB2	Kohonen’s self-organizing map (SOM)	SVM	2 CORINA Global 8 CORINA 2D	0.87	0.59	0.43	0.85	0.63	0.45

QSAR models: Dataset 3 (157 macrocyclic inhibitors from Dataset 1)

Model	Splitting method	Algorithm	Descriptors	Training set r2	Training set sd	Training set MAE	Test set r2	Test set sd	Test set MAE
Model C2 (for predicting 157 macrocyclic inhibitors)	Kohonen’s self-organizing map (SOM)	MLR	2 CORINA Global 8 CORINA 2D	0.29	0.81	0.60	0.32	0.86	0.62
Model MC1	Kohonen’s self-organizing map (SOM)	MLR	2 CORINA Global 8 CORINA 2D	0.58	0.57	0.41	0.47	0.66	0.47
Model D4 (for predicting 157 macrocyclic inhibitors)	Kohonen’s self-organizing map (SOM)	SVM	2 CORINA Global 8 CORINA 2D	0.60	0.56	0.39	0.55	0.62	0.41
Model MD2	Kohonen’s self-organizing map (SOM)	SVM	2 CORINA Global 8 CORINA 2D	0.76	0.45	0.28	0.67	0.50	0.35
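
The comparison between whole-dataset and sub-dataset models can be checked with a few lines of arithmetic. The snippet below only restates test-set r2 values copied from the Dataset 2 and Dataset 3 tables and computes the gain of each sub-dataset model over the corresponding whole-dataset model applied to the same subset; the dictionary layout and labels are illustrative choices.

```python
# Test-set r2 values from the Dataset 2 and Dataset 3 tables, as
# (whole-dataset model applied to the subset, model trained on the subset).
test_r2 = {
    "linear, MLR (C2 vs LA1)": (0.68, 0.77),
    "linear, SVM (D4 vs LB2)": (0.83, 0.85),
    "macrocyclic, MLR (C2 vs MC1)": (0.32, 0.47),
    "macrocyclic, SVM (D4 vs MD2)": (0.55, 0.67),
}

# Gain in test-set r2 from training on the subset, rounded to table precision.
gains = {name: round(sub - whole, 2) for name, (whole, sub) in test_r2.items()}

for name, gain in gains.items():
    print(f"{name}: test r2 gain = {gain:+.2f}")
```

Every gain is positive, and the gains are larger for the macrocyclic subset, where the whole-dataset models perform worst.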

Main project members

Zijian Qin (秦子健)

PhD student

zijianqin@foxmail.com